Fast and accurate short read alignment with Burrows–Wheeler transform
Authors
Abstract
MOTIVATION: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for aligning longer reads, where indels may occur frequently. The speed of MAQ is also a concern when alignment is scaled up to the resequencing of hundreds of individuals.

RESULTS: We implemented the Burrows-Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows-Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is approximately 10-20× faster than MAQ while achieving similar accuracy. In addition, BWA outputs alignments in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be performed with the open source SAMtools software package.

AVAILABILITY: http://maq.sourceforge.net
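To make "backward search with the Burrows-Wheeler Transform" concrete, here is a minimal Python sketch of exact-match backward search over a toy reference. It is an illustration under stated assumptions, not BWA's implementation: the function names (`bwt_index`, `backward_search`) are hypothetical, the suffix array is built by naive sorting (impractical for a genome), and it handles exact matches only, whereas BWA itself allows mismatches and gaps.

```python
def bwt_index(reference):
    """Build a toy BWT index: suffix array, C table, and Occ table.

    Assumption: naive O(n^2 log n) suffix sorting, fine only for tiny inputs.
    """
    text = reference + "$"  # "$" is a sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character = the character preceding each sorted suffix
    # (text[-1] is the sentinel when the suffix starts at position 0).
    bwt = "".join(text[i - 1] for i in sa)

    # C[c] = number of characters in text that are strictly smaller than c
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, C, occ


def backward_search(read, sa, C, occ):
    """Return sorted start positions of exact occurrences of read."""
    lo, hi = 0, len(sa)  # half-open suffix array interval [lo, hi)
    for ch in reversed(read):  # extend the match one character to the left
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:  # empty interval: the read does not occur
            return []
    return sorted(sa[lo:hi])


sa, C, occ = bwt_index("ACGTACGTTACG")
print(backward_search("ACG", sa, C, occ))
```

Each loop iteration narrows the suffix array interval using only table lookups, so the search cost depends on the read length rather than on the reference length. This property is what makes BWT-based indexes attractive for aligning many short reads against one large reference.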
Similar resources
Faster and More Accurate Sequence Alignment with SNAP
We present the Scalable Nucleotide Alignment Program (SNAP), a new short and long read aligner that is both more accurate (i.e., aligns more reads with fewer errors) and 10–100× faster than state-of-the-art tools such as BWA. Unlike recent aligners based on the Burrows-Wheeler transform, SNAP uses a simple hash index of short seed sequences from the genome, similar to BLAST’s. However, SNAP gre...
An Ultra-fast Approach to Align Longer Short Reads onto Human Genome
With the advent of second-generation sequencing (SGS) technologies, deoxyribonucleic acid (DNA) sequencing machines have started to produce reads, called “longer short reads”, which are much longer than the previous generation's so-called “short reads”. Unfortunately, most existing read aligners do not scale well for these second-generation longer short reads. Moreover, many of th...
CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform
MOTIVATION New high-throughput sequencing technologies have promoted the production of short reads with dramatically low unit cost. The explosive growth of short read datasets poses a challenge to the mapping of short reads to reference genomes, such as the human genome, in terms of alignment quality and execution speed. RESULTS We present CUSHAW, a parallelized short read aligner based on th...
Highly Scalable Short Read Alignment with the Burrows-Wheeler Transform and Cloud Computing
Benjamin Langmead, Master of Science, 2009. Directed by Professor Steven L. Salzberg, Department of Computer Science. Improvements in DNA sequencing have both broadened its utility and dramatically increased the size of sequencing datasets. Sequencing instruments are now used regularly a...
NINJA-OPS: Fast Accurate Marker Gene Alignment Using Concatenated Ribosomes
The explosion of bioinformatics technologies in the form of next generation sequencing (NGS) has facilitated a massive influx of genomics data in the form of short reads. Short read mapping is therefore a fundamental component of next generation sequencing pipelines which routinely match these short reads against reference genomes for contig assembly. However, such techniques have seldom been a...